Enable TransformerEngine-backed Tensor Parallelism with Llama3. by cspades · Pull Request #1483 · NVIDIA/bionemo-framework

cspades · 2026-02-27T16:01:00Z

Description

Add tensor parallelism to BioNeMo-Recipes Llama3.
Requires TransformerEngine DTensor / DCP support from NVIDIA/TransformerEngine#2713 ("Add DCP compatibility for FSDP2-TP sharding in TransformerEngine.").

Usage

torchrun --nproc-per-node 8 train_fsdp2_nd_parallel.py

TODO: Add code snippet
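
As a rough illustration of how the 8 processes launched above might be divided between data-parallel (FSDP2) and tensor-parallel groups, here is a minimal pure-Python sketch. The function names `rank_to_mesh_coords` and `tp_groups` and the choice of `tp_size=2` are hypothetical and not taken from this PR. The row-major layout (consecutive ranks share a tensor-parallel group) is an assumption about how a 2D mesh is commonly arranged, and the recipe's actual configuration may differ.

```python
def rank_to_mesh_coords(rank: int, world_size: int, tp_size: int) -> tuple[int, int]:
    """Map a global rank to (dp_index, tp_index) in a row-major (dp, tp) mesh.

    Hypothetical helper for illustration only; not part of the recipe.
    """
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Quotient selects the data-parallel row, remainder the tensor-parallel column.
    return divmod(rank, tp_size)


def tp_groups(world_size: int, tp_size: int) -> list[list[int]]:
    """List the ranks in each tensor-parallel group (one group per dp row)."""
    if world_size % tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    return [list(range(start, start + tp_size)) for start in range(0, world_size, tp_size)]


if __name__ == "__main__":
    # Assumed example split: 8 ranks, tensor-parallel size 2, so 4 data-parallel rows.
    for group in tp_groups(world_size=8, tp_size=2):
        print(group)
```

Under these assumptions, each tensor-parallel group holds the shards of one weight matrix, while ranks with the same `tp_index` across rows form the data-parallel (FSDP2) dimension.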

Type of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Refactor
Documentation update
Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

ciflow:skip - Skip all CI tests for this PR
ciflow:notebooks - Run Jupyter notebooks execution tests for bionemo2
ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
ciflow:all - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. Use this label to force every bionemo2 test to run.
ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). Use this label to force the tests for every recipe to run.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123).
If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of these commands:

@coderabbitai review - Triggers a standard review
@coderabbitai full review - Triggers a comprehensive review

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

I have tested these changes locally
I have updated the documentation accordingly
I have added/updated tests as needed
All existing tests pass successfully

copy-pr-bot · 2026-02-27T16:01:04Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-02-27T16:01:10Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 205b4c24-b380-4cd9-abbb-47b79d57a7c6

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Cory Ye <cye@nvidia.com>

cspades self-assigned this Feb 27, 2026

cspades mentioned this pull request Feb 27, 2026

Add DCP compatibility for FSDP2-TP sharding in TransformerEngine. NVIDIA/TransformerEngine#2713

Open

13 tasks

cspades force-pushed the cye/llama3-te-tp branch 2 times, most recently from cd5fd20 to 3fe6445 Compare February 27, 2026 16:12

Enable TransformerEngine-backed Tensor Parallelism with Llama3.

0f4055d

Signed-off-by: Cory Ye <cye@nvidia.com>

cspades force-pushed the cye/llama3-te-tp branch from 3fe6445 to 0f4055d Compare March 5, 2026 01:07

cspades wants to merge 1 commit into NVIDIA:main from cspades:cye/llama3-te-tp